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We extend the definition of the Lerch distribution to the set of nonnegative integers 
for greater applicability to modeling count data. We express its properties in terms 
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1 Introduction 

X. 

Zornig and Altmann ( 1995| ) introduced a three-parameter, discrete univariate 



Lerch distribution that is defined on t he set of positive i ntegers and is a gen - 
eralization of related Zip f ( Zip! . 1949h . Zipf-Mandelbrot ( Mandelbro t. 1983), 



and Good ()GoodL I1953T ) distributions. These distributions have been used 
as models in ecology, linguistics, information science, and statistical physics. 
This generalization was made possible by realizing that the probability mass 
functions (p.m.f.s) p x of the Zipf, Zipf-Mandelbrot, and Good distributions 
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have the form of the terms in the Riemann zeta (p x ~ x~ s ), generalized zeta 
[p x ~ (x + v)~ s ], and Jonquiere's (p x ~ z x x~ s ) functions, respectively, which 
parametrically are special cases of Lerch's transcendent [p x ~ z x (x + v)~ s }. 
The Lerch distribution is expected to provide a better fit in problems arising 
in these disciplines because its extra parameters can better capture the shape 
of the data, which was demon strated with the zero-trunc ated Lerch distribu- 
tion applied to surname data ( Zornig and Altmann . 19951 ). However, in many 
other problems the random variable can have value zero, which prompted us to 
extend the definition of the Lerch distribution to the set of nonnegative inte- 
gers. This extension is in fact more natural as a definition, because the series 



function of the Lerch distribution is Lerch's transcendent, 



as an infinite sum with an index running from zero ((Magnus et al 



which is define d 
1963). 



Tr uncation of Lerch dis t ributi on, including the zero-truncated case studied 
by Zornig and Altmann (1995), can then be easily carried out if needed by 
mani pulating the arg uments of Lerch's transcendent (see auxiliary informa- 



tion ([Aksenovl . I2004I )). Moreover, we show that all functions of the Lerch 



distribution can be expressed in closed form in terms of Lerch's transcendent. 
We studied a novel and very effective converge nce acceleration techn ique for 



computation of Lerch's transcendent elsewhere ( Aksenov et ah , 2003|) 



we im- 



plemented it in freely available C and Mathematica code ((Aksenov . 2004 ) 



Lerch's transcendent is also availabl e in most major computation software en- 
vironments including Mathematica ( Wolfram , 1996f ). This makes calculations 
with the Lerch distribution easier in practice. 



Moti yation for using the Lerch distr ibution includes the following considera- 
tions. iKulasekera and Tonkvn (1992) proposed to use the Good distribution to 
model survival processes because its hazard function can be constant, mono- 
tonically decreasing or monotonically increasing depending on the value of 
one parameter, and dispersal processes because its variance can be greater, 
equal, or less than the mean. This allows modeling with a single distribution 
instead of the combination of three, negative binomial, Poisson, and binomial, 
as has often been the case in the ecologi cal literature, and instead of com- 
pound, generalized Poisson (|ConsuiE98l . and adjust ed genera l ized Poisson 
(iGupta et al. , 19961 ) , and other contagious distributions ( Nevman , 19391 : Feller , 
1943| ). However, since the Good distribution is not defined for the zero class, 
it is of no use for certain count problems where the zero class is important. We 
show that the general Lerch distribution that includes zero is also flexible with 
respect to its hazard function and variance-to-mean ratio, and can therefore 
be successfully applied in those cases. An additional advantage over compound 
and adjusted distributions is the use of a single distributional form over the 
whole range of data. We demonstrate the utility and better goodness-of-fit 
of the Lerch distribution with an example of under-dispersed data for num- 
bers of sea-urchin sperm in eggs. Calculations for this example are collected 
in a Mathematica notebook that is available from the corresponding author. 
They have been performed with a specially written Mathematica package Ler- 
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chDistribution.m that extends the scope of standard statistical functions of 
the Mathematica software system to include the Lerch distribution and adds 
the capability to fit Lerch par ameters to data. The package is freely available 
for download (|Aksenovl . 120041 ) . 



2 Definition and the distribution functions 

The p.m.f. of the Lerch distribution is given by 

cz x 

Pr{X = x)= p x = — — - 
[V + x) s 



where c is the normalization constant. Support of the general Lerch distribu- 
tion is the set of nonnegative integers x — 0, 1, .... In order for the distribution 
to be proper, all probabilities Pr(X = x) (1) have to sum to one: 



]T Pr(X = x) = l (2) 

x=0 



and so we obtain that the normalization constant is Lerch's transcendent 
c- 1 = Q(z,s,v) (3) 



which is defined by the following series ( Magnus et al. . 1966) 



§(z,s,v) = ]T |z|<l, v^0,-l,... (4) 



Also the p.m.f.s (1) have to be positive which is ensured first of all by requiring 
z > 0. Likewise, we have to require that v > 0, because for v < the p.m.f. 
alternates in sign for < x < [\v |] + 1 for integer s and is complex- valued for 
real s, where [■] signifies taking the integer part. 

The cumulative distribution function (c.d.f.) of the Lerch distribution is de- 
fined as 

F(x) = Pr(X <x) = l- + (5) 
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which is obtained using the functional relation ( Magnus et al. . 1966): 



$(z, s, v) = z m $(z, s,m + v) + 



m-l z x 



( X + V Y 



(6) 



While the c.d.f. of a discrete distribution by definition accepts values only 
at a countable set of nonnegative integers x, we can consider a continuous 
c.d.f. of the same form as (5) defined on a real line [0, oo); this continuous 
c.d.f. of course agrees with our discrete c.d.f. at the nonnegative integers. The 
definition in (5) implies that this c.d.f., at each integer x, is continuous to the 
right and discontinuous to the left of x. Thus the qth quantile of the Lerch 
distribution is the value x q such that 



F(x - 1) < x q < F(x) 



(7) 



The solution of this inequality is uniquely defined with probability one, and can 
be obtained numerically. This also gives an algorithm for random nu mber gen- 
eratio n from the Lerch distribution (see the auxiliary information (Aksenov 

ET 



The hazard function is defined as 



h(x) 



Pr{X = x) _ 1 
Pr(X>x) (v + x) s &(z, s, v + x) 



The probability generating function (p.g.f.) for \y\ < 1 is 



G(y) = Y,y x Pr(X 



x) 



2 = 



$(yz,s,v) 
${z,s,v) 



(9) 



The moment generating function (m.g.f.) is defined by 



${z,s,v) 



(10) 



The moments are then obtained as coefficients of t he Taylor series expansion 
of (10) in terms of t (see the auxiliary information ( Aksenov . E( 
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3 Some properties 



3.1 Unimodality 



The Lerch distribution, defined in (1), is strongly unimodal if s < and v > 1. 
These conditions can be derived as follows. 

The necessary and sufficient condition for the p.m.f. p x of any disc r ete di stri- 



ine necessary ana sumcient condition tor tne p. m.t. p T ot any disc r ete di s 
bution to be strongly unimodal was shown bv lKeilson and Gerber ( 197lh : 



Pi > Px-lPx+1 (11; 



Substituting the p.m.f. of the Lerch distribution (1) into the Equation (11) we 
obtain 



1 



(v + x)' 



> 1 (12) 



Inequality (12) can only be satisfied if s < 0. Furthermore, the condition 



leads to v > 1 for all x = 0, 1, . . ., which completes the conditions for strong 
unimodality. 

The mode of the Lerch distribution, defined in (1), is at x = [l/(^ 1 / s — 1) — 
v } + 1 = [1/(1 — z~ l l s ) — v], where [•] signifies taking the integer part, provided 
the conditions of strong unimodality are satisfied. The mode can be obtained 
by resolving the general inequalities for a discrete distribution with the p.m.f. 
p x that has a mode at x = a: 



Px+l < Px, 
Px > Px-1, 



allx > a 
all x < a 



(13) 



(See auxiliary information ([Aksenovl . 120041 ) for complete derivation.) 



3.2 Monotonicity of the hazard function 



Depending on the sign of parameter s, the hazard function can be constant, 
monotonically decreasing or monotonically increasing. This can be shown as 
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follows. First, take the derivative of h(x) with respect to x: 

dh(x) s $(z,s + l,v + x)(v + x) — $(z,s,v + x) 
dx h(x) $(z,s,v + x)(v + x) 



(14) 



Now, represent Lerch's transcendents in the numerator of (14) as infinite sums 
and join these into a single sum, and write the numerator as 

~ Z ^ (v + x + k)^ (15) 



which is evidently negative. Finally, as h(x) and the denominator in (14) are 
positive, the sign of the derivative is determined by the sign of s: hazard 
function will be constant if s = 0, monotonically decreasing if s > 0, and 
monotonically increasing if s < 0. 



3.3 Variance-to-mean ratio 



In order to calculate the variance-to-mean ratio, we need first and second mo- 
ments. Mean and variance are calculated by differentiating the m.g.f. in (10) 
with respect t o t and using relations hips between central and uncorr ected mo- 
ment s as per ( Stuart and Ordill994[ ) (see the auxiliary information ( Aksenovl . 



20041 ) for complete derivation). The mean is given by 

S, V 



<&(z,s — l,v) , , 

* = L , _ .A -v (16) 



The variance is given by 

a =^ 2 = (v + ^) + — r 17 



Now we prove that for a suitable selection of parameters of the Lerch distri- 
bution its variance can be greater, equal or less than its mean, thus making 
it useful for analysis of over- and under-dispersed data. Consider the Taylor 
series expansion of the mean and the variance of the Lerch distribution given 
by Equations (16) and (17), respectively, around z = to order z 1 and z 2 , 
respectively: 



n = z T , ; \ a +0(z 2 
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,2s 



a 2 = z- 



+ z 2 2(v + 1) 2- 



+ 0(z 3 ) 



(18) 



(l + w) s ' v ' ' V (2 + w) s (1 +f) 2s 7 
Now calculate the ratio of variance to mean to order z 1 using Equations (18): 



a 



— = 1 + z2(v + 1) 



2 + v 



1 + v 



(19) 



Clearly, the ratio of variance to mean will be less, equal or greater than unity, 
depending on the sign of the expression in square brackets in Equation (19). 
This expression can change its sign only if s < 0. Thus, we obtain that for any 
v > and z > 0, which is small in Taylor's sense, there is s < such that 



— <,=,> 1 



iff 



s <,=,> 



log 2 



log (l + ^ 



(20) 



Furthermore, based on numerical experiments, we conjecture that for s > 
— log2/log(l + l/(f 2 + 2v)), the variance-to-mean ratio will remain greater 
than unity for larger deviations of z from zero. However, for s < — log 2/ log(l+ 
l/(v 2 + 2v)), the ratio remains less than one only for small deviations of z from 
0; for larger deviations in z the ratio becomes greater than unity. 



4 Example 



Consider fertili zation of sea-urchin eggs by sperm (rows 1 and 4 in Table 1) 
(Morgan, 1975| ). which is an example of underdispersed data that is fitted 
well by the Lerch distribution but not fitted at all by the generalized Poisson 
distribution. Experiments are done by examining the number of fertilized eggs 
in a batch at two different times, 40 and 180 sec. Poisson model failed to fit the 



data, suggesting a nonrandom pattern of fertilization. Janardan et al. ( 19791 ) 
fit the generalized Poisson disrtribution that has the p.f. ( Consull .' 1989f ) 



x— 1 _— 0— x\ 



x 



Pa 



0,1,2,... 



x > 



when A < 



(21) 



where [•] signifies the integer part operation, with parameters 9 = 1.0077 and 
A = -0.3216 for 40 sec. and 9 = 2.4654 and A = -1.0893 for 180 sec. (see 
rows 2 and 5 in Table 1, respectively, for calculated frequencies). However, 
since A < for both data sets, the generalized Poisson distribution should 
be truncated at 3 and 2, respectively. Clearly the truncation points being less 
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than the maximum observed class indicate that generalized Poisson distribu- 
tion fails to fit the whole range of data. We fit the Lerch distribution by min- 
imizing the Pearson X 2 statistic. The best-fit parameters are z = 0.00773867, 
s = -8.26894, v = 1.11633 for 40 sec. and z = 0.0835808, s = -1.15174, 
v = 0.00468234 for 180 sec. (see rows 3 and 6 in Table 1 for calculated fre- 
quencies). Since there are only four nonzero classes in both data sets, the Chi- 
squared distribution would have zero d.f., and thus we use the sum of squared 
deviations as a goodness-of-fit measure. For 40 sec. the sum is 0.000372774 
and for 180 sec. it is 0.000162184. 



The above example demonstrates that the Lerch distribution can provide a 
useful model with better good ness-of-fit for ov er- and under-dispersed data 
(see the auxiliary information (jAksenovl . 120041 1 for additional examples), for 
which modifications of the Poisson distribution or the reduced Lerch (Good) 
model perform less well or even fail, as well as for some rank-abundance eco- 
logical data, for which the reduced Lerch (Zipf-Mandelbrot) model does not 
fit. The ultimate reason is that the Lerch distribution is a generalization of the 
Good and Zipf-Mandelbrot distributions (which were originally proposed as 
models for these problems) and is endowed with greater flexibility to accomo- 
date the odd distributional shapes often observed in real problems. A comment 
on the method of parameter estimation is in order. We found in practice that 
the most convenient and accurate method of parameter estimation is mini- 
mization of the Pearson X 2 statistic, which works well for data even with a 
few frequency classes. The m oment and max imum likelihood methods derived 
in the auxiliary information (|Aksenovl . 120041 ) are more difficult to use in such 
situations because of higher variance of sample moments and expectations in 
a small sample setting. An added difficulty is the slow convergence of moment 
and maximum likelihood equations that involve Lerch's transcendent. These 
difficulties are not specific to the Lerch distribution and are in fact common 
for other distributions that are based on special functions. However, we found 
that these methods perform well with more rich data sets (not shown). 
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Col. Counts 





1 


2 


3 


4 


40 sec. 


1 Observed 


28 


44 


7 


1 





2 Gen. Poisson 


29.2046 


40.5931 


10.2044 


0.0236894 




3 Lerch 


28.0876 


43.0737 


8.17627 


0.631919 


0.0295335 


180 sec. 


4 Observed 


2 


81 


15 


1 


1 


5 Gen. Poisson 


8.49748 


62.2665 


26.5388 






6 Lerch 


1.99482 


80.7898 


14.9625 


1.99311 


0.23192 



Table 1 

Distribution of sea-urchin sperm on eggs at two different times of exposure. The 
observations are frequencies of sperm on eggs, frequen cy classe s are numbers of 
sperm found on an egg. Observed frequencies come from ( Morgan! 19751 ). Predicted 
frequencies come from the best-fit generalized Poisson and Lerch distributions. See 
text for details. 
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1 Mode 

The p.m.f. of the Lerch distribution is given by 
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where c is Lerch's transcendent 

c-^c&Ow) (2) 



which is defined by the following series ( Magnus et al. . 19661 ) 



z n 



Proposition 1 The mode of the Lerch distribution, defined in (1), is at x = 
[^/{z 1 ^ — 1) — v) + 1 = [1/(1 — 2 _1 / s ) — v), where [■] signifies taking the 
integer part, provided the conditions for strong unimodality as per main text 
are satisfied. 

Proof. A discrete distribution with the p.m.f. p x has a mode at x — a if the 
following inequalities hold: 



Px+i < Px, all x > a 

Px > Px-i, all x < a (4) 
Using Equation (1) we rewrite these inequalities as 

/ v + x \ s /v + x — l\ s , , 

z{ ——) < 1, z(— >1 5 

\v + x + lj V v + x J - w 



Provided that the conditions for unimodality are satisfied, these inequalities 
can be resolved to obtain the mode. 

First, these inequalities can be rearranged as (we use the fact that raising 
quantities that are less than one to negative power reverses the inequality 
sign) 

— < z x ' s - 1, — > 1 - z- x ' s 
v+x v+x 



Considering that < z < 1, s < and v > 1, we see that both the left 
and right hand sides of both inequalities are positive. Rearranging we obtain 
further 

1 1 
x > —-. V, x < —, V 

- z l/s _l ' - I _ z -X/b 



We can see that the (real) endpoints of the half-intervals defined by these in- 
equalities bracket the mode which is a unique integer, because their difference 



2 



is exactly one: 

1 1 z 1 ' 3 



i - z-y* z 1 /* - i z 1 /' - 1 z l i s 



Taking the integer part of either of the endpoints we obtain the expressions 
for the mode.n 



2 Random number generation 



Random number generation from a discrete distribution can in principle be 
accomplished with a number of techniques ( Devrovel 1986). We propose a 
direct way to obtain the discrete Lerch random variate using inversion by 
correction. In the inversion by correction method, one starts with a c.d.f. 
H(x) that is "close" to the true c.d.f Fix) 



F(x) 



S, V + X + 1) 

$(z,s,v) 



(6) 



and there is an "easy" algorithm to generate a random number from H(x) 
(e.g., straightforward inversion). Given H(x), one samples from H(x) and then 
corre cts by sequential search to obtain a random variate from F(x) ([Devrovel . 
1986). Obtaining a first estimate of x in this way serves to shorten the ordinary 
sequential search. Additional savings of computational time can be obtained if 
H(x) can be constructed in such a way that F(x) and H(x) are stochastically 
ordered, i.e. H(x) < F(x) or H{x) > F(x) for all x. In this case we know 
beforehand if we have to search up or down. In the case of the Lerch c.d.f., it 
is possible to construct a stochastically ordered H( 



x+1 $(z,s,v + l) 



H(x) = l-z x+i ^ ' - ' (7) 



H{x) and F(x) are stochastically ordered because $(z, s, v+1) > <&(z, s, v +x+ 
1) and thus H(x) < F(x) if s > 0, and because $(z, s, v + 1) < s, v+x + 1) 
and thus H(x) > F(x) if s < 0, for all x as x — > oo. 

The algorithm to generate a random Lerch variate is then as follows. 

The time it takes to generate a random number is on average proportional to 
the expected number of calculations of F, E(C), which in turn is related to 
the number of comparis ons during the search and hence to the expectation 
E(x), E(C) = 1 + E(x) (jDevrovel .1l986h. 
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Inversion by correction: Lerch distribution^, s, v) 

(1) Generate a uniform (0, 1) random variate U. 

(2) Calculate a first guess x by inversion and truncation of a continuous H(x), 



x 



l | log{l-U)$(z,s,v)/${z,s,v + l) 
log z 



1 



where [•] signifies the integer part operation. 
(3) Finally, if s > 0, x is adjusted down sequentially, x — x — 1, until C/ > 
F(x — 1). If s < 0, we adjust up sequentially, x — x + 1, until C/ < 

Theorem 2 The expected number of computations, E(C), for the sequential 
search in Algorithm 2 is less than the expected number in the ordinary sequen- 
tial search, for all parameter values where the Lerch distribution is defined. 

Proof. For the ordinary sequential search the expected number of computa- 
tions is 

and for the inversion by correction method it is 
E 2 = 1 + E(x) = 1 + E \ F ( X ) ~ H ( x )\ = 1 + 

X 

J2T =0 z x+l \$(z, s,v + l)- s,v + x + l)\ 
$(z,s,v) 

UH>F, then 

^ ,,«, + i) g ^ = «(*...» + i) « (8) 

&(z,s,v) ^ &(z,s,v) l-z 

which is always a positive quantity, and thus E\ > E 2 . 
If if < F, then 

Ei - E 2 = f)(l - (2F(x) - ff(x))) = £(1 - G(x)) (9) 

a;=0 x=0 



which is an expectation of x with respect to the c.d.f. G(x) on a set of non- 
negative integers, and thus a positive quantity. Hence E 1 > E 2 . 

Here G(x) = 2F(x) — H{x) is a valid c.d.f. because G(x) is a nondecreasing 
function of x, G(0) = and \im x ^ oc G(x) = !.□ 
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Corollary 3 The difference between the expected numbers of computations, 
E(C), for the ordinary sequential search and the inversion by correction Al- 
gorithm 2 is less if s > than if s < 0. 

Proof. By using Equations (6) and (7) we rewrite Equation (9), which corre- 
sponds to s > 0, as 



${z,s,v + l) z 
$(z,s,v) 1-z 

The inequality is obtained by using the fact that for s > we have s, v + 
1) > s,v + x + 1). 

Finally, observe that the difference in Equation (10) is less than the difference 
in Equation (8).D 

Corollary 3 implies that the gain in efficiency by using inversion by correction 
vs. ordinary sequential search is diminished for s > 0; the gain is higher 
if s < 0. Obviously, Algorithm 2 can also be used for calculation of a gth 
quantile replacing U with q. 



3 Moments 



The moment generating function (m.g.f.) is defined by 



The factorial (descending) moment generating function (f.m.g.f.) is defined by 

G{l + t) = *W + 1) '? V) W 
$(z,s,v) 



and the factorial (descending) moments are the coefficients of the Taylor series 
of (12). 

The rth uncorrected moment can be obtained by differentiating r times the 
m.g.f. in (11) with respect to t and letting t — 0: 
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'd r G(e^ 



1 / cf$(e'z, s, v 



dt r 



1 ^ x r z x 



E 



®(z,s,v) ^ Q (x + v) 



:, r = l,2,... 



(13) 



\r-3 



By using the binomial expansion for x r = (x+v—v) r = Z^=n (j ) (x+vY(—v J 
and changing the order of summation ((Stuart and OrdL Il994[ ) we obtain fur- 
ther 



$(z, s, v) f^ Q \j 



£ r )(-v) r -t*(z,s-j,v) 



(14) 



Thus, the mean is given by 
$(z, s — l,v) 



$(z,S,v) 



— V 



(15) 



Similarly, the second and third uncorrected moments are given by 



/ 2 o 
IM 2 — V —IV 



<&(z,s — l,v) $(z,s — 2,v) 



mz,s,v) 



/i 3 = — v + 3f 



3v $(z,s-2,v) s - 3, u) 



$(2, s, f) $(2, s, u) 



(16) 



The central moments are obtained from the uncorrected moments using the 
following relationship ()Stuart and Ord . 1994): 



a*t = E(- 1 )*^JaC*(m'x)* 



(17) 



Substituting the uncorrected moments fi' r _ k from Equation (14) into Equa- 
tion (17) and rearranging terms we obtain 



k=0 j=0 



J 



v y-k-j ( sjl) k ^,s-j,v) 



Now exchange the order of the summations, use the relationship 
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and then collapse the resulting binomial expansion to obtain 

^=<^±yr>Q*(*,s-j,v)(v + M'-' (19) 



Finally, substituting the mean (15), we obtain 

= 7hT~~ V tW ( r ) *(*, s - j, v) ( * { *>;-\ v) ) ^ J (20) 



Thus, the variance is given by 

, 2 (-2fo + ;/)$(*, s-l,?;) + $(*,s- 2,?;)) 
a=^ = (T; + ^+ (21) 



Similarly, the third and fourth central moments are 



fi 3 = -(v + fif + — (3{v + fi) 2 <5>{z, s-l,v)- 

<3>(Z, S, V) v 

3(u + /i)$(-2, s - 2, u) + $(^, s - 3, u)) 

/x 4 = (u + /i) 4 + ^y- r (-4(u + /i) 3 $(^, s - 1, u)+ 

6(u + /J>) 2 $(z, s-2,v)- 4(u + /i)$(-2, s - 3, u)+ 

$(z, S -4,^)) (22) 



The ratios of some central moments are commonly used as indices for the 
shape of the distribution. Namely, skewness is defined as 

"3 = -172 ( 23 ) 



Kurtosis is defined as 

« 4 = £ (24) 



Skewness and kurtosis can be calculated by substituting the mean and the 
central moments from Equations (15), (21) and (22), respectively. 

The rth factorial (descending) moments can be obtained from the f.m.g.f. by 
differentiation r times with respect to t and letting t — 0: 
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'd r G(i + ty 

dt T 



1 ~ jWf 
&(z,s,v) ^ (x + vY 



t=0 $(z,s,v) 



'd r <S>((l+t)z,s,vY 
dt T 



t=o 



r = l,2,... 



(25) 



where = x(x — 1) . . . (x — r + 1) is the descending factorial. By inverting 
Newton's difference formula for the power functio n x r , the descending facto- 
rials can be expressed as functions of powers of x ().Tohnson et allll992h : 



i=0 



(26) 



where s(r, i) are the Stirling numbers of the first kind ( Goldberg et al. . 19761 ). 
Substituting Equation (26) into Equation (25) and changing the order of sum- 
mation we obtain 



x x l Z X 



i=0 



x=0 



(x + vY 



r = l,2, 



(27) 



i=0 



which relates factorial moments to the uncorrected ones. Thus, the first three 
factorial (descending) moments are 



^[l] — A*! 



$(z, s — l,v) 



<$>{z,s,v) 

V[2]=V2 "Ml = U(V + 1) 



/xj 3] = // 3 - 3/4 + 2^ = + l)( v + 2) + 



$(^s-M) (2i; + 1) , $(*,s-2,i;) 



mz,s,v) 



$(z,s - l,v) 
$(z,s,v) 

$(z, s - 3,i;) 
$(z,s,u) 



(3v 2 + 6v + 2) - 3 



s - 2,i;) 
$(z,s,u) 



(u + 1) + 



(28) 



4 Truncated Lerch distribution 



The Lerch distribution can be considered in singly or doubly truncated forms, 
which will change the summation limits in the normalization equation 

00 

Y,Pr(X = x) = l (29) 
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To calculate the normalization constant in these cases, we make use of the 



functional relation ( Magnus et al. . 1966): 



771 — 1 



$(z, s, v) = z m §(z, s,m + v) + 



o ( x + V Y 



(30) 



If the truncation points are a > and a < b < oo, the normalization constant 
is 



-i 



E 



z 



z a $(z, s,v + a)- z b+l ®(z, s, v + b + 1) 



a (V + XY 



(31) 



The requirement v > for the distribution to be proper is replaced with 
v > —a for a truncated form of the distribution. 



For example, in the zero-truncated case we have a = 1 and b = oo and 
cr 1 = zQ(z,s,v + 1). This expression ha s to be contrasted wi t h the one im- 



mediately following Equation (2.1b) in ( Zornig and Altmann . 1995) and all 
other equations in which th ey use the symbol $. One is cautioned that in 
f Zornig and Altmann . Il995h . the symbol $ is referred to a s Lerch's tran- 
scendent (see l^ation (LI) in Jzormg ,r,d A WtJ S), but it lacks 
the first term of the infinite series and is thus not the correct definition of 
Lerch's transcendent accord ing to Equation (3). However, the calculations in 
f Zornig and Altmann . 1995f ) will be correct if one uses direct summation in- 
stead of Lerch's transcendent $ as implemented in various software systems. 



The c.d.f. is 



F(x) = c £ 



z a $(z, s,v + a) - z x+1 $(z, s, v + x + 1) 
z a $(z, s,v + a) — z b+1 §(z, s, v + b + 1) 



(32) 



where we used Equation (31) for the normalization constant. 
The survival function is 



S(x) = z 



z x $(z, s, v + x + 1) - z b §(z, s, v + b + 1) 
z a &(z, s,v + a) - z b+1 &(z, s, v + b + 1) 



(33) 



The hazard function is 



h(x) = (z(v + x) s (®(z, s, v + x + 1) - z h x ®(z, s, v + b + 1 



(34) 
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The p.g.f. is given by 

= (yz) a §{yz, s,v + a)- (yz) b+1 $(yz, s,v + b + l) 
z a <S>(z, s,v + a) - z b+1 <$>(z,s,v + b+ 1) 



For the zero-truncated case we obtain 
y$(yz,s,v + 1) 

= *(Z,8,V + 1) (36) 



which should be contrasted with Equation (2.2) in ([Zornig and Altmann 



19951 ). where again their $ is not the correct Lerch's transcendent of Equa- 



tion (3). 

Uncorrected moments are given by 



1 



! z a ®(z, s,v + a) - z b+1 $(z, s, v + b + 1 



£ . \(-vy-i(z a <S>(z,s-j,v + a)- 
j=0 \JJ 

z b+1 $(z,s-j,v + b+l)) (37) 
and central moments are given by 



z a §(z, s,v + a) — z b+1 §(z, s, v + b + 1) 

\r-j ( r \ (T.afh(~ B _ A „, i ^ _ v b+l 



J2(-l) r ~ J . {z a ${z, s-j,v + a)- z b+1 $(z, s-j,v + b + l))x 



3=0 



( z a $(z,s- l,v + a) -z b+1 <5>{z,s - l,v + b+l) Y 3 

\ z a $(z,s,v + a) - z b+1 ®(z,s,v + b + l) J ^ ' 

For example, in the zero-truncated case one obtains uncorrected moments as 

1 A /r N 



and central moments as 
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j=0 



Equa tion (40) has to be contrasted with Equation (3.1) in (|Zornig and Altmann . 



19951 ). where the running index j incorrectly starts from 1 and $ is again not 



the correct Lerch's transcendent of Equation (3). 
5 Parameter estimation 

Suppose we are given a set of nonnegtaive integer-valued data 

{xi}, i = 1, ... ,ra (41) 

Sample data may also be given as a vector of empirical frequences, 

{fj = #(x = xj)/n}, xj =0,1,... (42) 



where # signifies the number of occurences. Here we consider two methods 
for fitting the Lerch distribution to data given as (41) or (42), the moment 
and maximum likelihood methods. We implemented both methods in a Mathe- 



matica package LerchDistribution.m which is available for download ([Aksenov . 



2004). 



5. 1 Method of moments 



The moment method (MM) for the Lerch distribution involves equating the 
first three uncorrected moments (15) and (16) [or central moments (21) and (22), 
or factorial moments (28)] to their sample counterparts 



/ —1 \ ^ T 

m r = n } t Xj 
i=i 


= £ Ir'-'j 

j 




n 

m r = n~ 1 ^2{xi 
i=i 


- m 'iY = J2fi( x j- m 'iY 
j 




n 

i -1 [r 
m [r] = U l^ X i 
i=l 


3 


(43) 



for uncorrected, central and factorial moments, respectively. 
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For example, equating uncorrected moments gives the following set of equa- 
tions for estimators z, s and v (equating central or factorial moments can be 
done in a similar way): 



, ${z,s-l,v) _ 

m 1 = v 

$(z,s,v) 



/ -2 n.-M Z > 8 -^ V ) i $(Z,S-2,V) 



m 9 = v — 2v , _ _. h 



<&(z,s,v) $(z,s,v) 
[z,s-l,v) _ 



. n ~$(z,s — l,v) $(z,s — 2,v) 

m' = -v 3 + 3v 2 y ; ' - 3v y \ J + 



$(z, s — 3,v) 



(44) 



Upon substituting values of sample moments, Equations (44) can be solved 
numerically to obtain MM estimates z, s and v. 

The approximate asymptotic variance-c ovariance matrix for MM estimates 
can be obtained using the delta method ( Stuart and Or in which one 

approximates the right-hand sides of Equations (44), fj,' r (9) where 6 = (z, s, v), 
by the first terms of Taylor series expansion. Then after applying the variance 
operator we have: 

varU) = j:(^f] 2 var(e j ) + j: £ ^M^ft) (45) 



Similarly, the covariance of population moments fi' r (9) is 

^(K,^) = E##^) + E E ww cov{6kA) (46) 

j=l j j k=l l=l,ljtk k ' 



To obtain equations for the variances and covarianc es of 6 we now calcula te 
the variance and covariance of sample moments m' r ( Stuart and Or 

3,G9) 



var[m r . 



-(/4r 

n 



fi' r ), cov(fi' r ,fi') = -Qjl' 



n 



+q 



(47) 



(which is an exact result) and equate them to the variance and covariance of 
population moments (45) and (46). As a result we obtain the following matrix 
equation 

Vl = H ■ Vj (48) 
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The sought Vj is defined as 

V e = (var(9 1 ),var(9 2 ),var(9 3 ),cov(9 1 ,9 2 ),cov(9 1 ,9 3 ),cov(9 2 ,9 3 )) (49) 



where superscript T signifies transposition. The vector of variances and co- 
variances of sample moments (using (47)) is given by: 



( var{m' l ) \ 
var(m' 2 ) 



nV^ = n 



varyrrin 



covlrrii, m' 2 ) 
cov(m' 1 , 777.3) 
\cov(m' 2 ,m' 3 ) J 



1 vk - M 2 x 
n' A - fi'i 



/4 



^4 - 

\/4 -// 2 // 3 / 



(50) 



and can be calculated using formulas for uncorrected moments in terms of 
Lerch's transcendent (14) (not shown). The remaining component of Equa- 
tion (48) is the matrix H, given by 



H 



H 2 U 

H 21 



H 12 
H 22 
H 32 



H 13 



H 23 
H 33 



H11H21 Hi 2 H 2 2 

H\\H 3 i H12H32 

\H21H31 H 2 2H 32 

2 if n if 13 

2 if 21 #23 
2ff 31 if 33 

HnH 23 + Hi 3 H 2 i 
H\\H 33 + Hi 3 H 31 
H2iH 33 + H2 3 H 3 i 

The elements of H 



2H U H 12 
2 if 21 if 22 
2 if 31 if 32 
H\ 3 H 23 H n H 2 2 + H 12 H 2 i 
H\ 3 H 33 H U H 32 + H 12 H 31 
H2 3 H 33 H 2 iH 3 2 + H 2 2 H 3 i 

2H 12 H 13 

2/^22-^23 

2H 3 2H 33 

ff 12^23 + Hi 3 H 2 2 
H± 2 H 33 + H\ 3 H 3 2 
H 2 2H 33 + H 23 H 3 2 J 



89, 



(51) 



(52) 
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can be expressed in terms of Lerch's transcendent (Appendix A). 

Finally, the vector of variances and covariances of estimates 6 = (z, s, v) is 
obtained as the solution of Equation (48) by using (50), (51) and (A. 2): 

Vj = H 1 ■ Y T m (53) 



5.2 Method of maximum likelihood 



Maximum likelihood (ML) estimators for the Lerch distribution can be ob- 
tained in a relatively straightforward way. The likelihood function is 

iw= iH* = * )= ^w <54> 

It is more convenient to work with the log-likelihood function which is obtained 
by taking the logarithm of (54) 

n n 

log L{9) = —n log &(z, s,v) + log z^Xi — log(v + Xi) (55) 



ML estimates are obtained by maximizing the likelihood function L(6) (54) 
or equivalently minimizing the log-likelihood log L(9) (55), which is done by 
solving the following system of equations for derivatives 



dlogL _ w dlog$ | E? =1 Si _ 
dz dz z 

,91ogL = n dlog§ " 1 =Q 
<9t> <9t> ~[v + Xi 

Substituting the partial derivatives of Lerch's transcendent from Appendix A 
into Equations (56) we obtain the following system of equations 



n 



_„ _ ^(z,s- M + l) - v$(z,s,v + l) 

i=i 



' ' &(z, s,v) 



oo log(-0+x)z 3: 
:=0 (v+x) s 



71-1 S lo s(^ + x i) = Z /j lo S(^ + x i) = " 
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n 



i=0 



1 

V + Xi 



fi 



j v + x i 



$(z,s,v) 



(57) 



To obtain the ML estimates z, s and v we have to solve numerically Equa- 
tions (57). 

Asymptotic variances and covariances of the ML estimates (49) are obtained 
as the inverse of Fisher's information matrix whose ijth element is the ex- 
pectation of the second partial derivative of the log-likelihood function with 
respect to the parameters, evaluated at the ML estimates: 



nL 



E 



d 2 log L(ey 

dOidO i 



(5* 



The information matrix is then defined as 



1 1 I I ^ 

l\\ J-12 -(13 



n 



1 12 1 22 123 

113 I23 



(59) 



The elements (58) of the information matrix can be obtained by straightfor- 
ward differentiation and expressed in terms of Lerch's transcendent (Appendix 
A). 

Finally, the elements of the vector of variances and covariances Vg (49) can 
be arranged as a variance-covariance matrix 



^ var(z) cov(z, s) cov(z,v)^ 



cov{z,s) var(s) cov{s,v) 
cov(z,v) cov(s,v) var(v) 



(60) 



which is calculated using Equations (59) and (A. 3): 
V = I 1 



(61) 



With both fitting methods, goodness-of-flt can be determined by using the 
classical Pearson Chi-squared statistic X 2 = nY^fLiifj ~Pj) 2 /Pj^ where M is 
the number of observed classes, fj is the observed frequ ency for the ?'th class, 
and Pj(0) is the fitted frequency (p.m.f.) for the jih class ((D'Agostino and Stephensl . 
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1986]). Even though we are really testing the composite hypothesis (we esti- 
mate parameters from data), asymptotically X 2 is distributed as % 2 (M— p— 1), 
where p = 3 is the number of parameters estimated. Minimization of the 
Chi-squared statistic can also be used to estimate parameters of the Lerch 
distribution. 



6 Examples 



In this section we consider several examples of fitting the Lerch distribution 
to over- and under-dispersed data that arise in a variety of counting pro- 
cesses, and to rank-abundance ecological data. These data sets have been 
modeled using Poisson, Poisson mixtures, generalized Poisson, adjusted gen- 
eralized Poisson, and Zipf- Mandelbrot distributions. Our results show that the 
Lerch distribution provides better goodness-of-fit and in two cases a successful 
fit when the alternatives fail to fit in the whole range of data. 



The first data set that we analyze is the overdispersed data on the numbers of 
sowbugs Trachelipus rathk ei found un der boards (columns 1 and 2 in Table 1). 
The data were obtained by Cold ( 1946h in studies of the distribution of different 
cryptozoa species within areas of their natural habitat. Cole found that the 
distribution of spiders under wooden boards scattered in the area follows a 
Poisson distribution, which indicates a random distribution of individuals and 
their unsocial behavior. In contrast, the distribution of sowbugs exhibited 
properties of the contagious distrib ution and could not be fit with a Poisson 
distribution. Janardan et al. ( 1979f ) successfully fit the sowbu g data set with 
the generalized Poisson model that has the p.f. (Consul, 19891 ) 



x — 1 „— 6— x\ 



Pa 



,x = 0,l,2,... 



x > 



when A < 



(62) 



where [•] signifies the integer part operation, with parameters 9 = 1.5416 and 
A = 0.5321 (see column 3 in Table 1 for calculated frequencies). We grouped 
classes 6 and 7, 8 and 9, 10 and 11, and 12 through 17. With this grouping 
we obtained the Pearson statistic X 2 = 9.3089 1 , with a p-value of 0.231232 
for 7 degrees of freedom (d.f.). 

Here we fit this data set with the Lerch distribution. To fit the Lerch distri- 
bution, we use the same grouping, and minimize Pearson X 2 statistic. The 
best-fit parameters are z = 0.913315, s = 2.37621, v = 9.63785, and the 
achieved minimum X 2 = 7.69169 (see column 4 in Table 1 for calculated 

1 All numbers in this section have been rounded to five significant digits 
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Counts Observed Gen. Poisson Lerch 






28 


26.1127 


29.2839 


1 


28 


23.6448 


21.153 


2 


14 


18.095 


15.6055 


3 


11 


13.3871 


11.7173 


4 


8 


9.86865 


8.93021 


5 


11 


7.31252 


6.89379 


6 


2 


5.46003 


5.38124 


7 


3 


4.10969 


4.24166 


8 


3 


3.11713 


3.37228 


9 


3 


2.38107 


2.70168 


10 


3 


1.83053 


2.1791 


11 


2 


1.41548 


1.76882 


12 





1.10029 


1.44369 


13 


1 


0.859339 


1.18433 


14 


2 


0.67404 


0.976077 


15 


1 


0.530761 


0.807877 


16 





0.419425 


0.671287 


17 


2 


0.332522 


0.559812 



Table 1 

Distribution of sowbugs under boards. The observations are frequencies of sow- 
bugs per board, frequency class e s are numbers of sowbugs per board. Observed 
frequencies are taken from dColel . ll946h . Predicted frequences come from the best- 
fit generalized Poisson and Lerch distributions. See text for details. 

frequencies). The p-value is 0.261572 for 6 d.f. We see that both generalized 
Poisson and Lerch fits are acceptable and goodness-of-fit is better with the 
Lerch distribution. 



Another way to model data that is over- or under-dispersed is to use an ad- 
j usted distribution that would correct for larger or smaller number of zeros 
f)Gupta et al.l ll996). Consider data on the numbers of death notices for women 
80 years and older that had been published in the London "Times" ev e ry da y 
for three consecutive years (columns 1 and 2 in Table 2) (|Hasselbladl . 1969). 
The data were examined for differences of death rates between winter and 
summer mont hs and thus were o riginally fitted by a mixture of two Poisson 
distributions. Gupta et al. ( 19961 ) observed that the data set might be zero 
deflated and fit an adjusted generalized Poisson model in which they intro- 
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Counts Observed Adj. Gen. Poisson Lerch 






162 


162.004 


161.906 


1 


267 


264.773 


266.73 


2 


271 


268.751 


264.789 


3 


185 


194.406 


192.091 


4 


111 


112.384 


112.56 


5 


61 


55.2168 


56.4979 


6 


27 


23.9531 


25.217 


7 


8 


9.4128 


10.2649 


8 


3 


3.41263 


3.87964 


9 


1 


1.15712 


1.37948 



Table 2 

Distribution of death notices in the London "Times" . The observations are frequen- 
cies of death notices published per day, frequency classes are numbers of n otices 
published per day. Observed frequencies are taken from ( Hasselblad . 1969h . Pre- 



dicted frequences come from the best-fit adjusted generalized Poisson and Lerch 
distributions. See text for details. 

duced an additional parameter cj> to describe fewer zeros than predicted by the 
generalized Poisson: 

r >+(l-0)e- e , x = 



P.m.f. (62) is given here in a so-called restricted form ((Consul LL982) as m 
( Gupta et all |l996) . The best-fit parameters of the adjusted generalized Pois- 
son model are 9 = 2.038 2 , a = 0.03639 and <fi = 0.02015 (see column 3 in 
Table 2 for calculated frequencies). The Pearson statistic is X 2 = 1.9379, with 
a p- value of 0.7472 for 4 d.f. 



Since the Lerch distribution accounts for the zero class naturally, we fitted it 
to the death notices data by minimizing the Pearson X 2 statistic. The best-fit 
parameters are z = 0.189628, s = —7.10717, v = 2.81275 (see column 4 in Ta- 
ble 2 for calculated frequencies). Minimum Pearson statistic is X 2 = 1.23938, 
with a p- va lue of 0.871573 for 4 d.f. Note that since th e last three classes were 
grouped in ( Hasselbladl . Il969| ) and ( Gupta et al. . 1996| ). we grouped them here 
as well to fit the Lerch distribution and calculate the minimum X 2 . Goodness- 



2 He re we corrected an apparent misprint in Table 3 and Figure 4 of ( Gupta et al 
1996) that read 9 = 1.2038 
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Counts 12 3 

Observed 5 68 88 32 

Gen. Poisson 8.67105 57.5966 97.4296 29.5181 
Lerch 2.79364 67.0555 93.2649 26.8808 

Table 3 

Distribution of bean weevil eggs on beans. The observations are frequencies of eggs 
oviposited by bean weevil per bean, frequenc y classes are num bers of eggs found on 
a bean. Observed frequencies are taken from ( Mitchell . Il975l ) . Predicted frequences 



come from the best- fit generalized Poisson and Lerch distributions. See text for 
details. 

of-fit is better with the Lerch distribution than with the adjusted generalized 
Poisson model. 

Now we turn to underdispersed data on the numbers of eggs oviposited by 
bean weevil Ca llosobrachus maculatus on beans (rows 1 and 2 in Table 3) 
f)Mitchelll . E975h . The data were obtained in studies of oviposition tactics by 



bean weevils with the mean number of eggs per bean 1.8. Again, a simple 
Poisson distribution did not fit well, suggesting that egged an d unegged beans 



are no t equally attractive for weevils seeking to oviposit an egg. I.Tanardan et al 



f 1979f ) fit the generalized Poisson distribution with (62) to the weevil dat 



with parameters 9 = 3.1027 and A = —0.7612 (see row 3 in Table 3 for 
calculated frequencies) . For comparison with the Lerch distribution, for which 
we would have zero d.f. of the Chi-squared distribution, we use the sum of 
squared deviations between the p.f. and empirical frequencies as a goodness- 
of-fit measure. For the generalized Poisson distribution the sum is 0.00581991. 

We fit the Lerch distribution to the weevil data by minimizing the Pearson 
X 2 statistic. The best-fit parameters are z = 0.00116201, s = -24.9577, 
v = 2.04499 (see row 4 in Table 3 for calculated frequencies). The sum of 
squared deviations is 0.00160233, and the goodness-of-fit is again better with 
the Lerch distribution. 

Finally, we consider rank-abundance dat a on biotic compartments in Lake 
Yunoko, Japan (rows 1 and 2 in Table 4) (|Aochil . ll995l ). The Zipf- Mandelbrot 



distribution as a model for f r equenc i es of r anked species was suggested for use 
in ecology bv lFrontier ( 19851 ). Aochi (|l995l ) observed that neither Zipf nor Zipf- 



Mandelbrot models can fit the whole range of data and suggested disregarding 
the most abundant s pecies (fish) from the data, which could then be fit with 
the Zipf distribution. lAochi (1995) gives the anthropogenic source of fish in the 



lake as a reason for disregarding the one data point. However, there are reasons 
to believe that different biotic compartments influence each other and, in 
particular the disregarded fish, might influence abundances of phytoplankton 
and other compartments. Thus, it is of interest to fit the whole range of data. 
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Rank 1 2 3 4 5 6 

Observed 0.46798 0.428571 0.0738916 0.0152709 0.00837438 0.00591133 

Lerch 0.460902 0.404575 0.102876 0.0245955 0.00573359 0.00131821 
Table 4 

Distribution of ranked biotic compartments in Lake Yunoko. The observations are 
relative abundances of the biotic compartments, classes are ranks of compart ments 
(rank one is the most abundant, etc.) Observed frequencies are taken from ( Aochil . 
1995). Predicted frequences come from the best-fit Lerch distribution. See text for 
details. 



The Lerch distribution provides a suitable model, being a generalization of the 
Zipf-Mandelbrot distribution. There are only six classes in the data starting 
with one. The observed abundancies are calculated from the da ta on standin g 
crop in compartments, expressed in grams of carbon per m 2 ( Aochi . Il995h . 
by dividing these values by the total crop. We fit the doubly truncated Lerch 
distribution by minimizing the Pearson X 2 statistic. The best-fit parameters 
are z = 0.219158, s = -0.214704, v = -0.998437 (see row 3 in Table 4 for 
calculated frequencies); the Pearson statistic is X 2 = 0.0259897, with a p- value 
of 0.987089 for 2 d.f. The fit is obviously very good. 



In summary, the above examples demonstrate that the Lerch distribution 
can provide a useful model with better goodness-of-fit for over- and under- 
dispersed data, for which modifications of the Poisson distribution or the 
reduced Lerch (Good) model perform less well or even fail, as well as for 
some rank-abundance ecological data, for which the reduced Lerch (Zipf- 
Mandelbrot) model does not fit. The ultimate reason is that the Lerch dis- 
tribution is a generalization of the Good and Zipf-Mandelbrot distributions 
(which were originally proposed as models for these problems) and is endowed 
with greater flexibility to accomodate the odd distributional shapes often ob- 
served in real problems. A comment on the method of parameter estimation is 
in order. We found in practice that the most convenient and accurate method 
of parameter estimation is minimization of the Pearson X 2 statistic, which 
works well for data even with a few frequency classes. The moment and max- 
imum likelihood methods are more difficult to use in such situations because 
of higher variance of sample moments and expectations in a small sample set- 
ting. An added difficulty is the slow convergence of MM and ML equations 
that involve Lerch's transcendent. These difficulties are not specific to the 
Lerch distribution and are in fact common for other distributions that are 
based on special functions. However, we found that these methods perform 
well with more rich data sets (not shown here). 
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A Appendix: variance-covariance matrices for method of moments 
and method of maximum likelihood estimators 



To calculate the elements of matrices H and I, we need the derivatives of 
Lerch's transcendent, which can be calculated by straightforward differentia- 
tion term by term: 



d ^ Z ^ v) = s - 1, v + 1) - v$(z, s,v + l) 



d$(z, s, v) _ ^ log(v + n)z n 

ds ( v + n Y 

d&(z, s, v 



dv 



= -s$(z,s + l,v) (A.l) 



Note that the infinite sum in the second equation (A.l) converges, because 
log(t> + x) grows slower than (v + x), which converges to $(z, s — l,v). 

Then the quantities Hij (52), which are the elements of H (51) are calculated 
as follows 



H u = <5>~ 2 (z, s, v)($(z, s-2,v + l)$(z, s, v) - 

<&{z, s — l,v + l)®(z, s — l,v) + v($(z, s,v + l)$(z, s - 1, v) 
$(z,s- l,v + l)$(z,s,v))) 

log(f + n)z n 



H 12 = $- 2 (z,s,v)[®(z,s-l,v)J2 



log(f + n)z n \ 



His = $~ 2 (z, s, v)({l - s)$ 2 (z, s, v) + s$(z, s - 1, v)$(z, s + l,v))-l 
H 2 i = -2vH 11 + $- 2 (z,s,v)(<S>(z,s-3,v + l)<S>(z,s,v) - 

$(z, s -l,v + l)$(z, s - 2, v) + v($(z, s,v + 1)<&(z, s - 2, v) - 

<&{z,s-2,v + l)<k(z,s,v))) 

Un = ~2vH 12 + «L 2 (z, s, v) U(z, s - 2, v) £ " ' : " 



n=0 \ V + U ) 



~ log(u + n)z r 

n=0 



(v + n) s 2 



${z,s,v) 

&~ 2 (z, s,v)((2 - s)&(z, s - l,v)$(z, s,v) + 
s$(z, s + 1, v)$(z, s - 2, v)) 
H 31 = -3v 2 H u - 3vH 21 + $- 2 (z, s, v)($(z, s - 4, v + l)$(z, s, v) - 
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$(z, s - 1, v + l)$(z, s -3,v) + v($(z, s, v + i)$(z, s - 3, v) - 
<&(z,s-Z,v + l)<k(z,s,v))) 

H 32 = -3v 2 H 12 - 3vH 22 + $ 2 (z, 5, «) (V- s - 3, „) £ ,f?{ " " ) "" 



=o (v + n) 



~ l g(u + n)z r 



#33 = -3f #13 - 3vH 23 - 3—- r h 

$(Z,S,V) 

$~ 2 (-2, s, v)((3 - s)$(z, s-2, v)$(z, s, v) + 

s$(z,s + l,v)®(z,s-3,v)) (A.2) 



The elements (58) of the information matrix (59) are calculated as follows 

I u = <$>-\z, s, v)($(z, s, v)($(z, s-2,v + 2)- 

(2v + l)$(z, s-l,v + 2)+v(v + l)$(s, z, v + 2)) - 

(*(,,. -I,,) -,*(,,., „+!))') + 

Jl2 = $ _2 (2, S,f) (($(z,S - 1,U + 1) - S,U + 1))X 

^ logjv + n)z n ^ \og(v + n + l)(n + l)z n \ 

S ~~ $( ^' s v) S ^TiT J 

il 3 = S$ 2 (-2, S,u)($(z, S,u)(u$(z, S + 1,V+ 1) - $(-2, S,V+ 1)) + 
s + 1, v)($(z, S - 1, V + 1) - U<&(z, S, V + 1))) 

r *-2/ / 1 / N ^ \og\v + n)z n 

hi = **(z,s,v) «)E (v + w) . - 

/~ log^ + n)^ y' 
(*; + n)* J 

J 23 = $ \z,s,v) l$(z,s,v) 

V n=0 



(v + n) s+l 



s*(z s + l v)Y ^(v + n)z n \ Hz,s + l,v) 

J 33 = s 2 $- 2 (z, s, v)($(z, s, v)$(z, s + 2,v)- $ 2 (z, s + 1, u)) (A.3) 
When calculating expectations (58), we use the following asymptotic results: 



\ n i=i J 



nf^v + XiJ $(z,s,v) 
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(v + Xi) 2 



1 



$(z,s + 2,v) 



(A.4) 
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